Getting duplicates in dict - python-2.7

I have a dictionary, say d, that looks like this:
d = {'file1': 4098, 'file2': 4139, 'file3': 4098, 'file4': 1353, 'file5': 4139}
Now, I've figured out how to get it to tell me whether there are any duplicates. But what I'd like it to do is tell me which 2 (or more) values, and their corresponding keys, are duplicates.
The output for the above would tell me that file1 and file3 are identical and that file2 and file5 are identical.
I've been trying to wrap my head around it for a few hours, and haven't found the right solution yet.

Try this to get the duplicates (the code below assumes import itertools and import operator):
dups = [item for item in d.items() if list(d.values()).count(item[1]) > 1]
That outputs:
[('file3', 4098), ('file2', 4139), ('file1', 4098), ('file5', 4139)]
Next, sort the list by the second item in each tuple:
dups = sorted(dups, key=operator.itemgetter(1))
Finally, use itertools.groupby() to group by the second item. Don't name the variable list: that shadows the built-in, and the list(group) call would then fail.
groups = [list(group) for key, group in itertools.groupby(dups, operator.itemgetter(1))]
Final output:
[[('file3', 4098), ('file1', 4098)], [('file2', 4139), ('file5', 4139)]]
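An alternative that avoids calling count() for every item is to invert the dictionary, mapping each value to the keys that share it. The following is a minimal sketch rather than part of the answer above; the function name find_duplicate_values is made up for illustration.

```python
from collections import defaultdict


def find_duplicate_values(d):
    """Map each value that occurs more than once to the sorted keys sharing it."""
    keys_by_value = defaultdict(list)
    for key, value in d.items():
        keys_by_value[value].append(key)
    # Keep only the values shared by two or more keys.
    return {value: sorted(keys)
            for value, keys in keys_by_value.items()
            if len(keys) > 1}


d = {'file1': 4098, 'file2': 4139, 'file3': 4098, 'file4': 1353, 'file5': 4139}
duplicates = find_duplicate_values(d)
for value, keys in sorted(duplicates.items()):
    print(value, keys)
```

This makes a single pass over the dictionary, so it should scale better than counting the values once per item.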

How to get the intersection of two lists in the order of the first list in Kotlin

I have two lists and I want to return a result in the following way:
the result should contain the elements that are in both list one and list two
the output should be in the same order as the first list
Input :
val first = listOf(1, 2, 3, 4, 5,7,9,15,11)
val second = listOf(2, 15 , 4,3, 11)
Output:
val output = listOf(2,3,4,15,11)
Please help me to learn how to get common values in both lists in order of list first in Kotlin.
You can do
val output = first.filter { second.contains(it) }
What you are looking for is the intersection of the two lists:
val output = first.intersect(second)
As pointed out by @Ivo, the result is a Set, which can be turned into a list with output.toList(). However, since the result is a set, it contains no duplicates, e.g. if first is listOf(1,2,3,1,2,3) and second is listOf(2,4,2,4), the result will be equal to setOf(2).
If this is not acceptable, the solution of @Ivo should be used instead.

Python, FOR looping - creating lists

This is my code to create lists, but it's so brutal and inelegant. Do you have any ideas for making it smoother?
The thing is, I want to write code that lets you create your own lists, choosing how many lists to create and how many items each should have, NOT using a while loop. I can manage to create a certain number of lists by putting the count (number_of_lists) into the range of the for loop.
i = 0
number_of_lists = input('How many lists you want to make? >')
# This was originally range(3), and it will only work properly
# now if number_of_lists is exactly 3.
for cycle in range(number_of_lists):
    item1 = raw_input('1. item > ')
    item2 = raw_input('2. item > ')
    item3 = raw_input('3. item > ')
    # Everything is wrong with this code; I need it to be much
    # more autonomous than it is now.
    print "-------------------"
    if i == 0:
        list1 = [item1, item2, item3]
    if i == 1:
        list2 = [item1, item2, item3]
    if i == 2:
        list3 = [item1, item2, item3]
    i += 1
print list1
print list2
print list3
I also want to avoid all those 'if i == int' checks.
Right now it will only create 3 lists, because instead of number_of_lists I originally used the integer 3.
I hope you see my problem now. I need to create new lists from input and name them if possible, so instead of list1 I can name one DOGS or whatever.
I need it all to be much simpler and more interconnected. I hope you understand my problem and maybe have a smooth solution, thanks :)
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Ok, I think I got it now - this is new version, doing pretty much what i want it to do:
number_of_lists = input('How many lists you want to make? >')
allItems = []
for cycle in range(int(number_of_lists)):
    items = []
    number_of_items = input('How much items in this list? >')
    for i in range(int(number_of_items)):
        item = raw_input(str(i+1) + ". item > ")
        items.append(item)
    allItems.append(items)
    print("-------------------")
print allItems
If anyone has an idea for making this more efficient and clearer, let me know here! :) Thanks for the help, guys.
You can add your lists to another list, that way it's dynamic like you want. Example below:
number_of_lists = input('How many lists you want to make? >')
allItems = []
for cycle in range(int(number_of_lists)):
    items = []
    for i in range(1, 4):
        item = input(str(i) + ".item > ")
        items.append(item)
    allItems.append(items)
    print("-------------------")
for items in allItems:
    for item in items:
        print(item)
    print("-------------")
You'd still need to check that number_of_lists holds a valid whole number before converting it with int(). If the user types a letter, int() raises a ValueError.
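The validation mentioned above, together with the asker's wish to name the lists (DOGS and so on), could be sketched roughly as follows. This is only an illustration and assumes Python 3; parse_count, build_named_lists and the read parameter are hypothetical names, and read exists only so the function can be driven by scripted answers instead of the keyboard.

```python
def parse_count(text):
    """Return text as a non-negative int, or None if it is not one."""
    try:
        count = int(text)
    except ValueError:
        return None
    return count if count >= 0 else None


def build_named_lists(read=input):
    """Collect user-named lists into a dict instead of list1, list2, ..."""
    count = parse_count(read("How many lists do you want to make? > "))
    if count is None:
        print("That is not a whole number, so no lists were created.")
        return {}
    named_lists = {}
    for _ in range(count):
        name = read("List name > ")
        # An invalid item count is treated as zero items.
        size = parse_count(read("How many items in " + name + "? > ")) or 0
        named_lists[name] = [read(str(n + 1) + ". item > ") for n in range(size)]
    return named_lists
```

Calling build_named_lists() with no arguments reads from the keyboard. Storing the lists in a dictionary keyed by name avoids both the if i == ... chain and the need to create variables dynamically.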

Python keeps printing first index of for loop

I have data in two directories and I'm using for loops to read the files from both folders.
path_to_files = '/home/Desktop/computed_2d/'
path_to_files1 = '/home/Desktop/computed_1d/'
for filen in [x for x in os.listdir(path_to_files) if '.ares' in x]:
    df = pd.read_table(path_to_files+filen, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
    for filen1 in [x for x in os.listdir(path_to_files1) if '.ares' in x]:
        df1 = pd.read_table(path_to_files1+filen1, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
        print(filen,filen1)
Now, when I tried to print the file names, it kept printing names seemingly forever. It basically takes the first iteration of the first loop and prints it with every iteration of the second loop. I don't understand why this is happening.
What I want to do is print the first iteration of the first loop with the first iteration of the second loop, and so on.
The file names are the same in both folders, so when I print, the desired result should look something like this:
(txt_1.txt,txt_1.txt)
(txt_2.txt,txt_2.txt)
(txt_3.txt,txt_3.txt)
(txt_4.txt,txt_4.txt)
Where am I making the mistake?
If I understand your question correctly, you seem to want to print pairs of files from path_to_files and path_to_files1. Since you are nesting a for loop, for every iteration of the nested for loop, filen is not going to change.
I think you might want something more like this:
import os
import pandas as pd

path_to_files = '/home/Desktop/computed_2d/'
path_to_files1 = '/home/Desktop/computed_1d/'
# os.listdir() returns names in arbitrary order, so sort both lists
# to make sure zip() pairs up matching file names.
filelistn = sorted(x for x in os.listdir(path_to_files) if '.ares' in x)
filelist1 = sorted(x for x in os.listdir(path_to_files1) if '.ares' in x)
for filen, filen1 in zip(filelistn, filelist1):
    df = pd.read_table(path_to_files+filen, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
    df1 = pd.read_table(path_to_files1+filen1, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
    print(filen,filen1)
For a sample input of:
filelistn = ['a.ar', 'b.ar']
filelist1 = ['c.ar', 'd.ar']
I get the following output:
('a.ar', 'c.ar')
('b.ar', 'd.ar')
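Since the question says the file names are the same in both folders, another option is to pair files by name rather than by position, which stays correct even if one folder contains extra files. The sketch below assumes Python 3; paired_paths is a hypothetical helper, it matches on the file name ending with the suffix (slightly stricter than the '.ares' in x test above), and it uses temporary directories in place of the real data folders.

```python
import os
import tempfile


def paired_paths(dir_a, dir_b, suffix=".ares"):
    """Return (path_in_a, path_in_b) pairs for file names present in both directories."""
    names_a = {name for name in os.listdir(dir_a) if name.endswith(suffix)}
    names_b = {name for name in os.listdir(dir_b) if name.endswith(suffix)}
    common = sorted(names_a & names_b)
    return [(os.path.join(dir_a, name), os.path.join(dir_b, name)) for name in common]


# Demonstration with throwaway directories instead of the real data folders.
with tempfile.TemporaryDirectory() as dir_2d, tempfile.TemporaryDirectory() as dir_1d:
    for name in ("txt_1.ares", "txt_2.ares", "only_2d.ares"):
        open(os.path.join(dir_2d, name), "w").close()
    for name in ("txt_2.ares", "txt_1.ares", "notes.txt"):
        open(os.path.join(dir_1d, name), "w").close()
    pairs = paired_paths(dir_2d, dir_1d)
    paired_names = [(os.path.basename(a), os.path.basename(b)) for a, b in pairs]

print(paired_names)
```

Each returned pair could then be passed to pd.read_table exactly as in the loop above.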

Python: referring to each duplicate item in a list by unique index

I am trying to extract particular lines from a text output file. The lines I am interested in are a few lines above and a few lines below the key string that I am using to search through the results. The key string is the same for each result.
fi = open('Inputfile.txt')
fo = open('Outputfile.txt', 'a')
lines = fi.readlines()
filtered_list=[]
for item in lines:
    if item.startswith("key string"):
        filtered_list.append(lines[lines.index(item)-2])
        filtered_list.append(lines[lines.index(item)+6])
        filtered_list.append(lines[lines.index(item)+10])
        filtered_list.append(lines[lines.index(item)+11])
fo.writelines(filtered_list)
fi.close()
fo.close()
The output file contains the right lines for the first record, but multiplied for every record available. How can I update the indexing so it can read every individual record? I've tried to find the solution but as a novice programmer I was struggling to use enumerate() function or collections package.
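First of all, it would probably help if you said what exactly goes wrong with your code (a stack trace, it doesn't work at all, etc.). The likely cause of the repetition is that lines.index(item) always returns the position of the first line equal to item, so every record resolves to the first match. Anyway, here are some thoughts. You can try to divide your problem into subproblems to make it easier to work with. In this case, let's separate finding the relevant lines from collecting them.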
First, let's find the indexes of all the relevant lines.
key = "key string"
relevant = []
for i, item in enumerate(lines):
    if item.startswith(key):
        # Store the index, not the line itself.
        relevant.append(i)
enumerate is actually quite simple. It takes a list (or any other iterable) and yields (index, item) pairs. So, list(enumerate(['a', 'b', 'c'])) gives [(0, 'a'), (1, 'b'), (2, 'c')].
What I had written above can be achieved with a list comprehension:
relevant = [i for (i, item) in enumerate(lines) if item.startswith(key)]
So, we have the indexes of the relevant lines. Now, let's collect them. You are interested in the line 2 lines before the key and the lines 6, 10 and 11 lines after it. If the key is on one of the first two lines, then you have a problem: the index becomes negative, and a negative index counts from the end of the list, so lines[-1] is the last item, which you don't really want. Also, you need to handle the situation in which your offset would take you past the end of the list; otherwise Python will raise an IndexError.
out = []
for r in relevant:
    for offset in -2, 6, 10, 11:
        index = r + offset
        # 0 is a valid index, so the lower bound must be inclusive.
        if 0 <= index < len(lines):
            out.append(lines[index])
You could also catch the IndexError, but that won't save us much typing, as we have to handle negative indexes anyway.
The whole program would look like this:
key = "key string"
with open('Inputfile.txt') as fi:
    lines = fi.readlines()
relevant = [i for (i, item) in enumerate(lines) if item.startswith(key)]
out = []
for r in relevant:
    for offset in -2, 6, 10, 11:
        index = r + offset
        if 0 <= index < len(lines):
            out.append(lines[index])
with open('Outputfile.txt', 'a') as fo:
    fo.writelines(out)
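The same idea can be wrapped in a function and tried on in-memory data before touching real files. This is a small sketch under the assumptions already stated in the answer; lines_around_key and the sample records are made up for illustration.

```python
def lines_around_key(lines, key, offsets=(-2, 6, 10, 11)):
    """Collect, for every line starting with key, the lines at the given offsets."""
    selected = []
    for position, line in enumerate(lines):
        if not line.startswith(key):
            continue
        for offset in offsets:
            index = position + offset
            # Skip offsets that fall before the start or past the end.
            if 0 <= index < len(lines):
                selected.append(lines[index])
    return selected


# Two records with identical key lines but different surrounding lines.
sample = [
    "first a\n", "b\n", "key string\n", "first c\n",
    "second a\n", "b\n", "key string\n", "second c\n",
]
picked = lines_around_key(sample, "key string", offsets=(-2, 1))
print(picked)
```

Because the position comes from enumerate rather than from lines.index(), the second record yields its own neighbouring lines instead of repeating those of the first.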
To get rid of duplicates you can convert the list to a set; example:
x=['a','b','a']
y=set(x)
print(y)
will result in (Python 3 output shown; a set is unordered, so the elements may print in either order):
{'a', 'b'}

removing cyclic substrings from python list

I have a Python list like the following:
['IKW', 'IQW', 'IWK', 'IWQ', 'KIW', 'KLW', 'KWI', 'KWL', 'LKW', 'LQW', 'LWK', 'LWQ', 'QIW', 'QLW', 'QWI', 'QWL', 'WIK', 'WIQ', 'WKI', 'WKL', 'WLK', 'WLQ', 'WQI', 'WQL']
If we pick, say, the second element, IQW, we see that the list contains duplicates of this item; however, they are not noticeable right away, because the items are cyclic. I mean that the following are equivalent:
IQW, QWI, WIQ
An item could also be written backwards, which is also a duplicate, so I want that removed too. So the full list of duplicates (the rotations above plus the reverse of each one) is:
IQW, QWI, WIQ, WQI, IWQ, QIW
So essentially I would like IQW to be the only one left.
Bonus points if the one that remains in the list is sorted alphabetically.
The way I did it was to sort the letters of every item into alphabetical order:
`IQW`, `QWI`, `WIQ`, `WQI`, `IWQ`, `QIW` ->
`IQW`, `IQW`, `IQW`, `IQW`, `IQW`, `IQW`
and then remove the duplicates.
However, this also removes combinations that should be kept. Say I have ABCD and CDAB. These are not the same because the ends only meet once. But my method will sort them to ABCD and ABCD and remove one.
My code:
print cur_list
sortedlist = list()
for i in range(len(cur_list)):
    sortedlist.append(''.join(map(str, sorted(cur_list[i]))))
sortedlist = set(sortedlist)
L = ['IKW', 'IQW', 'IWK', 'IWQ', 'KIW', 'KLW', 'KWI', 'KWL', 'LKW', 'LQW', 'LWK', 'LWQ', 'QIW', 'QLW', 'QWI', 'QWL', 'WIK', 'WIQ', 'WKI', 'WKL', 'WLK', 'WLQ', 'WQI', 'WQL']
seen = set()
res = []
for item in L:
    # Rotate the item so that its smallest letter comes first.
    c = item.index(min(item))
    item = item[c:] + item[:c]
    if item not in seen:
        seen.add(item)
        # Also mark the reversed cycle (same first letter) as seen.
        seen.add(item[0]+item[-1:0:-1])
        res.append(item)
print res
output:
['IKW', 'IQW', 'KLW', 'LQW']
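One caveat with rotating to the first occurrence of the smallest letter: when that letter appears more than once, two rotations of the same cycle can end up looking different (for example 'AABC' and 'ABCA' would both be kept). A more general variant, sketched here as an illustration and not taken from the answer above, uses the smallest string among all rotations of the item and of its reverse as the key. The names canonical_form and unique_cyclic are hypothetical, and the sketch assumes Python 3.

```python
def canonical_form(s):
    """Return the smallest string among all rotations of s and of s reversed."""
    candidates = [s]
    for text in (s, s[::-1]):
        for shift in range(len(text)):
            candidates.append(text[shift:] + text[:shift])
    return min(candidates)


def unique_cyclic(items):
    """Keep one canonical representative per cyclic/reversed class, in first-seen order."""
    seen = set()
    result = []
    for item in items:
        key = canonical_form(item)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


L = ['IKW', 'IQW', 'IWK', 'IWQ', 'KIW', 'KLW', 'KWI', 'KWL', 'LKW', 'LQW', 'LWK', 'LWQ',
     'QIW', 'QLW', 'QWI', 'QWL', 'WIK', 'WIQ', 'WKI', 'WKL', 'WLK', 'WLQ', 'WQI', 'WQL']
print(unique_cyclic(L))
```

For the list in the question this gives the same four items as the answer above. The cost is proportional to the square of the item length per item, which is negligible for short strings like these.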
Here is the solution I coded. If anyone has a better algorithm, I will accept that as the answer:
# subpeptides_linear() is my own helper (not shown here); it returns
# the substrings of the doubled string.
mylist = list()
for item in copy_of_cur:
    linear_peptide = item+item
    mylist = filter(lambda x: len(x) == 3 , subpeptides_linear(linear_peptide))
    for subitem in mylist:
        if subitem != item:
            if subitem in cur_list:
                cur_list.remove(subitem)