I have an input file where each line is a special record.
I would gladly work at the file level, but it might be more convenient to read the file into a list (each object in the list = one row in the file).
In the input file, there can be several duplicate rows.
The goal: split the given file/list into unique records and duplicate records, i.e., for records that are present multiple times, keep one occurrence and store the remaining duplicates in a new list.
I found an easy way to remove duplicates but never found a way to store them.
File inputFile = new File("....")
inputFile.eachLine { inputList.add(it) } //fill the list
List inputList = [1,1,3,3,1,2,2,3,4,1,5,6,7,7,8,9,8,10]
inputList = inputList.unique() // remove duplicates
println inputList
// inputList = [1, 3, 2, 4, 5, 6, 7, 8, 9, 10]
The output should look like this: two lists/files, one with the duplicates removed and one with the duplicates themselves
inputList = [1,3,2,4,5,6,7,8,9,10] //only one occurrence of each line
listOfDuplicates = [1,1,1,3,3,2,7,8] //duplicates removed from original list
The output does not need to correspond with the initial order of items.
Thank you for help, Matt
You could simply iterate over the list yourself:
def inputList = [1,1,3,3,1,2,2,3,4,1,5,6,7,7,8,9,8,10]
def uniques = []
def duplicates = []
inputList.each { uniques.contains(it) ? duplicates << it : uniques << it }
assert inputList.size() == uniques.size() + duplicates.size()
assert uniques == [1,3,2,4,5,6,7,8,9,10] //only one occurrence of each line
assert duplicates == [1,3,1,2,3,1,7,8] //duplicates removed from original list
inputList = uniques // if desired
There are many ways to do this; the following is the simplest:
def list = [1,1,3,3,1,2,2,3,4,1,5,6,7,7,8,9,8,10]
def unique=[]
def duplicates=[]
list.each {
    if (unique.contains(it))
        duplicates.add(it)
    else
        unique.add(it)
}
println list //[1, 1, 3, 3, 1, 2, 2, 3, 4, 1, 5, 6, 7, 7, 8, 9, 8, 10]
println unique //[1, 3, 2, 4, 5, 6, 7, 8, 9, 10]
println duplicates //[1, 3, 1, 2, 3, 1, 7, 8]
Hope this helps.
Something very straight-forward:
List inputList = [1,1,3,3,1,2,2,3,4,1,5,6,7,7,8,9,8,10]
def uniques = [], duplicates = []
Iterator iter = inputList.iterator()
iter.each{
    iter.remove()
    inputList.contains( it ) ? ( duplicates << it ) : ( uniques << it )
}
assert [2, 3, 4, 1, 5, 6, 7, 9, 8, 10] == uniques
assert [1,1,3,3,1,2,7,8] == duplicates
If order of duplicates isn't important:
def list = [1,1,3,3,1,2,2,3,4,1,5,6,7,7,8,9,8,10]
def (unique, dups) = list.groupBy().values()*.with{ [it[0..0], tail()] }.transpose()*.sum()
assert unique == [1,3,2,4,5,6,7,8,9,10]
assert dups == [1,1,1,3,3,2,7,8]
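This removes every value that occurs only once. Note that the resulting list keeps all occurrences of the repeated values, including the first one:
List listOfDuplicates = inputList.clone()
listOfDuplicates.removeAll{
    listOfDuplicates.count(it) == 1
}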
The more the merrier:
groovy:000> list.groupBy().values()*.tail().flatten()
===> [1, 1, 1, 3, 3, 2, 7, 8]
1. Group by identity (this is basically a "frequencies" function).
2. Take just the values.
3. Clip the first element of each group.
4. Combine the lists.
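For comparison, the same four steps can be sketched in Python rather than Groovy. This is only an illustration: the function name split_duplicates is made up here, and it assumes Python 3.7+, where Counter keeps keys in first-seen order.

```python
from collections import Counter

def split_duplicates(items):
    """Return (uniques, duplicates); duplicates holds every occurrence past the first."""
    counts = Counter(items)            # group by identity: value -> frequency
    uniques = list(counts)             # one entry per distinct value, first-seen order
    duplicates = []
    for value, n in counts.items():
        # "clip the first element": keep n - 1 copies of each value
        duplicates.extend([value] * (n - 1))
    return uniques, duplicates

uniques, duplicates = split_duplicates(
    [1, 1, 3, 3, 1, 2, 2, 3, 4, 1, 5, 6, 7, 7, 8, 9, 8, 10])
print(uniques)     # [1, 3, 2, 4, 5, 6, 7, 8, 9, 10]
print(duplicates)  # [1, 1, 1, 3, 3, 2, 7, 8]
```

Unlike the contains-based loops above, this does a single counting pass, but the duplicates come out grouped by value rather than in their original positions.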
I have a list of numbers, let's say :
my_list = [2, 4, 3, 8, 1, 1]
From this list, I want to obtain a new list that starts at the maximum value and runs to the end, followed by the first part (from the beginning up to just before the maximum), like this:
my_new_list = [8, 1, 1, 2, 4, 3]
(basically it corresponds to a horizontal graph shift...)
Is there a simple way to do so ? :)
Apply either one as many times as you want.
To the left:
my_list.append(my_list.pop(0))
To the right:
my_list.insert(0, my_list.pop())
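If you need to shift by several positions at once, collections.deque has a rotate method that does the repeated pop/insert for you. A minimal sketch (the helper name shifted is just for illustration):

```python
from collections import deque

def shifted(values, steps):
    """Return a new list rotated by `steps`: positive = right, negative = left."""
    d = deque(values)
    d.rotate(steps)
    return list(d)

my_list = [2, 4, 3, 8, 1, 1]
print(shifted(my_list, -1))  # [4, 3, 8, 1, 1, 2]  (one step left)
print(shifted(my_list, 1))   # [1, 2, 4, 3, 8, 1]  (one step right)
print(shifted(my_list, -3))  # [8, 1, 1, 2, 4, 3]
```

This leaves my_list itself untouched, whereas the pop-based one-liners modify it in place.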
How about something like this:
max_idx = my_list.index(max(my_list))
my_new_list = my_list[max_idx:] + my_list[0:max_idx]
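Wrapped in a function, the slicing approach might look like the sketch below. The name rotate_to_max is made up for illustration, and the empty-list guard is an addition, since max() raises ValueError on an empty sequence.

```python
def rotate_to_max(values):
    """Return a new list rotated so that it starts at the first maximum."""
    if not values:
        return []
    max_idx = values.index(max(values))   # index of the first occurrence of the max
    return values[max_idx:] + values[:max_idx]

my_list = [2, 4, 3, 8, 1, 1]
my_new_list = rotate_to_max(my_list)
print(my_new_list)  # [8, 1, 1, 2, 4, 3]
```

If the maximum appears more than once, index() picks the first occurrence, so the result starts there.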
Alternatively you can do something like the following,
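import itertools
def shift(l, n):
    # Lazily yield len(l) items of the endlessly repeated list, starting at index n
    return itertools.islice(itertools.cycle(l), n, n + len(l))
my_list = [2, 4, 3, 8, 1, 1]
list(shift(my_list, 3))  # [8, 1, 1, 2, 4, 3]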
Elaborating on Yasc's solution for moving the order of the list values, here's a way to shift the list to start with the maximum value:
# Find the max value:
max_value = max(my_list)
# Move the last value from the end to the beginning,
# until the max value is the first value:
while my_list[0] != max_value:
    my_list.insert(0, my_list.pop())
When iterating on a 2d array, how can I get the current row index? For example:
x = [[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 9. 0. 3. 6.]]
Something like:
for rows in x:
print x current index (for example, when iterating on [ 5. 6. 7. 8.], return 1)
enumerate is a built-in Python function. Its usefulness cannot be summarized in a single line, yet many newcomers and even some experienced programmers are unaware of it. It lets us loop over something and have an automatic counter. Here is an example:
for counter, value in enumerate(some_list):
    print(counter, value)
And there is more! enumerate also accepts an optional argument which makes it even more useful.
my_list = ['apple', 'banana', 'grapes', 'pear']
for c, value in enumerate(my_list, 1):
    print(c, value)
# Output:
# 1 apple
# 2 banana
# 3 grapes
# 4 pear
The optional argument lets us tell enumerate where to start the index. You can also create a list of tuples containing the index and the list item by passing the result to list(). Here is an example:
my_list = ['apple', 'banana', 'grapes', 'pear']
counter_list = list(enumerate(my_list, 1))
print(counter_list)
# Output: [(1, 'apple'), (2, 'banana'), (3, 'grapes'), (4, 'pear')]
Use enumerate:
In [42]: x = [[ 1, 2, 3, 4],
...: [ 5, 6, 7, 8],
...: [ 9, 0, 3, 6]]
In [43]: for index, rows in enumerate(x):
...: print('current index {}'.format(index))
...: print('current row {}'.format(rows))
...:
current index 0
current row [1, 2, 3, 4]
current index 1
current row [5, 6, 7, 8]
current index 2
current row [9, 0, 3, 6]
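If you also need the column index, enumerate can be nested. A small sketch on plain nested lists (the helper find_positions is hypothetical, written only to show the pattern):

```python
x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 0, 3, 6]]

def find_positions(grid, target):
    """Return the (row, column) index pairs at which `target` occurs."""
    return [(r, c)
            for r, row in enumerate(grid)      # r is the current row index
            for c, value in enumerate(row)     # c is the current column index
            if value == target]

print(find_positions(x, 3))  # [(0, 2), (2, 2)]
```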
Say I have a dict of country -> [cities] (potentially an ordered dict):
{'UK': ['Bristol', 'Manchester', 'London', 'Glasgow'],
'France': ['Paris', 'Calais', 'Nice', 'Cannes'],
'Germany': ['Munich', 'Berlin', 'Cologne']
}
The number of keys (countries) is variable, and so is the number of cities in each list. The result set comes from a 'search' on city name, so, for example, a search on "San%" could potentially return 50k results (on a worldwide search).
The data is to be used to populate a select2 widget, and I'd like to use its paging functionality...
Is there a smart way to slice this such that [3:8] would yield:
{'UK': ['Glasgow'],
'France': ['Paris', 'Calais', 'Nice', 'Cannes'],
'Germany': ['Munich']
}
(apologies for the way this question was posed earlier -- I wasn't sure that the real usage would clarify the issue...)
If I understand your problem correctly, as talked about in the comments, this should do it
from pprint import pprint
def slice_dict(d, a, b):
    big_list = []
    ret_dict = {}
    # Make one big list of all numbers, tagging each number with the key
    # of the dict they came from.
    for k, v in d.iteritems():
        for n in v:
            big_list.append({k:n})
    # Slice it
    sliced = big_list[a:b]
    # Put everything back in order
    for k, v in d.iteritems():
        for subd in sliced:
            for subk, subv in subd.iteritems():
                if k == subk:
                    if k in ret_dict:
                        ret_dict[k].append(subv)
                    else:
                        ret_dict[k] = [subv]
    return ret_dict
d = {
    'a': [1, 2, 3, 4],
    'b': [5, 6, 7, 8, 9],
    'c': [10, 11, 12, 13, 14]
}
x = slice_dict(d, 3, 11)
pprint(x)
$ python slice.py
{'a': [4], 'b': [5, 6], 'c': [10, 11, 12, 13, 14]}
The output is a little different from your example output, but that's because the dict was not ordered when it was passed to the function. It was iterated as a-c-b, which is why b is cut off at 6 and c is not cut off.
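For what it's worth, here is a shorter sketch of the same idea for Python 3. It assumes Python 3.7+, where a plain dict keeps insertion order (on older versions you would pass an OrderedDict), and it uses itertools.islice so the flattened list is never built in full.

```python
from itertools import islice

def slice_dict(d, start, stop):
    """Slice the flattened (key, item) stream of d, then regroup the items by key."""
    pairs = ((key, item) for key, items in d.items() for item in items)
    result = {}
    for key, item in islice(pairs, start, stop):
        result.setdefault(key, []).append(item)
    return result

cities = {
    'UK': ['Bristol', 'Manchester', 'London', 'Glasgow'],
    'France': ['Paris', 'Calais', 'Nice', 'Cannes'],
    'Germany': ['Munich', 'Berlin', 'Cologne'],
}
print(slice_dict(cities, 3, 9))
# {'UK': ['Glasgow'], 'France': ['Paris', 'Calais', 'Nice', 'Cannes'], 'Germany': ['Munich']}
```

Note that with Python's usual half-open slices, [3:8] covers only five items (Glasgow through Cannes); the output shown in the question, which also includes Munich, corresponds to slice_dict(cities, 3, 9).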