All about list in python3 - list

My code is:
a = 3.5
list = [ 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
bigList = []
smallList = []
for i in list:
    if a < i:
        bigList.append( i )
    else:
        smallList.append( i )
print( min( bigList ) )
print( max( smallList ) )
My question: is there a function, or a smarter way, to get the nearest value below a and the nearest value above a (for example, if a = 6.1, the smaller value is 6 and the greater value is 7) without creating two new lists as I did?
Thank you so much.

You can use bisect if the list is sorted:
import bisect
insertion_point = bisect.bisect_right(lst, a)
small = lst[max(insertion_point - 1, 0)]
big = lst[min(insertion_point, len(lst) - 1)]
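small and big are the list values on either side of the insertion point. The indices are clamped to the list bounds, so if a is below the first element or above the last one, small and big are both that end element. If a itself is in the list, small is equal to a.
About bisect_right:
It returns an insertion point which comes after (to the right of) any existing entries of the item in the list.
Note: you shouldn't shadow list, which is a Python built-in. Use lst or a similar name.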
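As a minimal sketch of the bisect approach above, here it is wrapped in a function and run on the list from the question. The name neighbours is hypothetical (it is not part of bisect), and the input is assumed to be sorted in ascending order and non-empty.

```python
import bisect

def neighbours(sorted_values, a):
    """Return the list values on either side of a's insertion point."""
    # Hypothetical helper; sorted_values must already be sorted ascending.
    insertion_point = bisect.bisect_right(sorted_values, a)
    # Clamp both indices so they never fall outside the list.
    small = sorted_values[max(insertion_point - 1, 0)]
    big = sorted_values[min(insertion_point, len(sorted_values) - 1)]
    return small, big

lst = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(neighbours(lst, 6.1))  # (6, 7)
print(neighbours(lst, 3.5))  # (3, 4)
print(neighbours(lst, 0))    # (1, 1): clamped at the left end
print(neighbours(lst, 10))   # (9, 9): clamped at the right end
```

Each lookup is a binary search, so no extra lists are built and the cost is O(log n) once the list is sorted.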

Related

How to add list to heapq in python

How can I add the input list to a heap directly? There are built-in functions to push, get the minimum, and extract the minimum, but how do I extract the maximum from the heap?
Some of the functions:
heapify(iterable) :- This function converts the list into a heap in place, i.e. rearranges it into heap order.
heappush(heap, ele) :- This function inserts the element given in its arguments into the heap. The order is adjusted so that the heap structure is maintained.
heappop(heap) :- This function removes and returns the smallest element from the heap. The order is adjusted so that the heap structure is maintained.
from heapq import heapify, heappush
heap = []
heapify(heap)
heappush(heap, 10)
heappush(heap, 30)
heappush(heap, 20)
heappush(heap, 400)
# printing the elements of the heap
for i in heap:
    print( i, end = ' ')
print("\n")
import heapq
heap = [] # creates an empty heap
item = [20, 4, 8, 10, 5, 7, 6, 2, 9]
for i in item:
    heapq.heappush(heap, i) # pushes a new item on the heap
print('Heap obtained from heappush() : ', heap)
heapq.heapify(item) # transforms list into a heap, in-place, in linear time
print('Heap obtained from heapify() : ', item)
And for a max-heap:
heapq has private functions with the suffix _max, for example _heapify_max, _heapreplace_max, etc. The leading underscore means they are undocumented and may change between Python versions.
from _heapq import _heappop_max, _heapify_max, _heapreplace_max
a = [20, 4, 8, 10, 5, 7, 6, 2, 9]
_heapify_max(a)
print('Heap obtained from _heapify_max() : ', a)
Or you can multiply every element of the list by -1 and use the min-heap itself.
Then 100 becomes -100, 5 becomes -5, etc. Negate each value again when you pop it.
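A minimal sketch of the negation trick, using only the documented heapq functions. The variable names max_heap and largest are just for illustration, and it assumes the items are numbers so that negating them reverses their order.

```python
import heapq

values = [20, 4, 8, 10, 5, 7, 6, 2, 9]

# Store negated values so the smallest stored item is the largest original.
max_heap = [-v for v in values]
heapq.heapify(max_heap)

largest = -heapq.heappop(max_heap)  # negate again on the way out
print(largest)                       # 20

heapq.heappush(max_heap, -100)       # push 100 as -100
print(-max_heap[0])                  # 100, a peek at the current maximum
```

If you only need the largest item once, max(values) or heapq.nlargest(1, values) avoids building a heap at all.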
I hope this helps.

Finding duplicates in a list/file. [Groovy/Java]

I have an input file where each line is a special record.
I would gladly work on the file level but might be a more convenient way to transfer the file into a list. (each object in the list = each row in the file)
In the input file, there can be several duplicate rows.
The goal: split the given file/list into unique records and duplicate records. That is, for records which are present multiple times, keep one occurrence and store the other duplicate occurrences in a new list.
I found an easy way to remove duplicates, but never found a way to store them.
File inputFile = new File("....")
inputFile.eachLine { inputList.add(it) } //fill the list
List inputList = [1,1,3,3,1,2,2,3,4,1,5,6,7,7,8,9,8,10]
inputList = inputList.unique() // remove duplicates
println inputList
// inputList = [1, 3, 2, 4, 5, 6, 7, 8, 9, 10]
The output should look like this: two lists/files, one with the duplicates removed and one with the duplicates themselves.
inputList = [1,3,2,4,5,6,7,8,9,10] //only one occurrence of each line
listOfDuplicates = [1,1,1,3,3,2,7,8] //duplicates removed from original list
The output does not need to correspond with the initial order of items.
Thank you for help, Matt
You could simply iterate over the list yourself:
def inputList = [1,1,3,3,1,2,2,3,4,1,5,6,7,7,8,9,8,10]
def uniques = []
def duplicates = []
inputList.each { uniques.contains(it) ? duplicates << it : uniques << it }
assert inputList.size() == uniques.size() + duplicates.size()
assert uniques == [1,3,2,4,5,6,7,8,9,10] //only one occurrence of each line
assert duplicates == [1,3,1,2,3,1,7,8] //duplicates removed from original list
inputList = uniques // if desired
There are many ways to do this; the following is the simplest:
def list = [1,1,3,3,1,2,2,3,4,1,5,6,7,7,8,9,8,10]
def unique=[]
def duplicates=[]
list.each {
    if(unique.contains(it))
        duplicates.add(it)
    else
        unique.add(it)
}
println list //[1, 1, 3, 3, 1, 2, 2, 3, 4, 1, 5, 6, 7, 7, 8, 9, 8, 10]
println unique //[1, 3, 2, 4, 5, 6, 7, 8, 9, 10]
println duplicates //[1, 3, 1, 2, 3, 1, 7, 8]
Hope this helps.
Something very straight-forward:
List inputList = [1,1,3,3,1,2,2,3,4,1,5,6,7,7,8,9,8,10]
def uniques = [], duplicates = []
Iterator iter = inputList.iterator()
iter.each{
    iter.remove()
    inputList.contains( it ) ? ( duplicates << it ) : ( uniques << it )
}
assert [2, 3, 4, 1, 5, 6, 7, 9, 8, 10] == uniques
assert [1,1,3,3,1,2,7,8] == duplicates
If order of duplicates isn't important:
def list = [1,1,3,3,1,2,2,3,4,1,5,6,7,7,8,9,8,10]
def (unique, dups) = list.groupBy().values()*.with{ [it[0..0], tail()] }.transpose()*.sum()
assert unique == [1,3,2,4,5,6,7,8,9,10]
assert dups == [1,1,1,3,3,2,7,8]
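This code should solve the problem. Note that it keeps every occurrence of each repeated value, including the first one, so for the sample list the result is [1,1,3,3,1,2,2,3,1,7,7,8,8]:
List listOfDuplicates = inputList.clone()
listOfDuplicates.removeAll{
    listOfDuplicates.count(it) == 1
}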
The more the merrier:
groovy:000> list.groupBy().values()*.tail().flatten()
===> [1, 1, 1, 3, 3, 2, 7, 8]
1. Group by identity (this is basically a "frequencies" function).
2. Take just the values.
3. Clip the first element.
4. Combine the lists.
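For comparison, the same group-and-clip steps can be sketched in Python. This is only an illustration of the idea, not a translation of the Groovy API: split_duplicates is a hypothetical name, items are assumed to be hashable, and the output order relies on dicts preserving insertion order (Python 3.7+).

```python
def split_duplicates(items):
    """Return (uniques, duplicates) by grouping equal items together."""
    groups = {}  # value -> list of all its occurrences, in first-seen order
    for item in items:
        groups.setdefault(item, []).append(item)
    # The first element of each group is the occurrence we keep.
    uniques = [group[0] for group in groups.values()]
    # Everything after the first element of each group is a duplicate.
    duplicates = [x for group in groups.values() for x in group[1:]]
    return uniques, duplicates

input_list = [1, 1, 3, 3, 1, 2, 2, 3, 4, 1, 5, 6, 7, 7, 8, 9, 8, 10]
uniques, duplicates = split_duplicates(input_list)
print(uniques)     # [1, 3, 2, 4, 5, 6, 7, 8, 9, 10]
print(duplicates)  # [1, 1, 1, 3, 3, 2, 7, 8]
```

Because the grouping uses a dictionary, this is a single pass over the input rather than one contains() scan per element.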

Best way to shift a list in Python?

I have a list of numbers, let's say :
my_list = [2, 4, 3, 8, 1, 1]
From this list, I want to obtain a new list that starts at the maximum value and runs to the end, followed by the first part (from the beginning up to just before the maximum), like this:
my_new_list = [8, 1, 1, 2, 4, 3]
(basically it corresponds to a horizontal graph shift...)
Is there a simple way to do so ? :)
Apply these as many times as you want.
To the left:
my_list.append(my_list.pop(0))
To the right:
my_list.insert(0, my_list.pop())
How about something like this:
max_idx = my_list.index(max(my_list))
my_new_list = my_list[max_idx:] + my_list[0:max_idx]
Alternatively you can do something like the following:
import itertools
def shift(l, n):
    return itertools.islice(itertools.cycle(l), n, n + len(l))
my_list = [2, 4, 3, 8, 1, 1]
list(shift(my_list, 3))
Elaborating on Yasc's solution for moving the order of the list values, here's a way to shift the list to start with the maximum value:
# Find the max value:
max_value = max(my_list)
# Move the last value from the end to the beginning,
# until the max value is the first value:
while my_list[0] != max_value:
    my_list.insert(0, my_list.pop())
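Another option, sketched here as an alternative to repeated insert/pop on a list, is collections.deque, which has a built-in rotate method. The variable names are the ones from the question; if the maximum occurs more than once, index() picks the first occurrence.

```python
from collections import deque

my_list = [2, 4, 3, 8, 1, 1]

max_idx = my_list.index(max(my_list))  # position of the first maximum
d = deque(my_list)
d.rotate(-max_idx)                      # a negative count rotates to the left
my_new_list = list(d)
print(my_new_list)                      # [8, 1, 1, 2, 4, 3]
```

This leaves my_list untouched and does the whole shift in one call instead of one list insertion per step.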

python: Finding min values of subsets of a list

I have a list that looks something like this
(The columns would essentially be acct, subacct, value.):
1,1,3
1,2,-4
1,3,1
2,1,1
3,1,2
3,2,4
4,1,1
4,2,-1
I want to update the list to look like this:
(The columns are now acct, subacct, value, min of the value for each account)
1,1,3,-4
1,2,-4,-4
1,3,1,-4
2,1,1,1
3,1,2,2
3,2,4,2
4,1,1,-1
4,2,-1,-1
The fourth value is derived by taking the min(value) for each account. So, for account 1, the min is -4, so col4 would be -4 for the three records tied to account 1.
For account 2, there is only one value.
For account 3, the min of 2 and 4 is 2, so the value for col 4 is 2 where account = 3.
I need to preserve col3, as I will need to use the value in column 3 for other calculations later. I also need to create this additional column for output later.
I have tried the following:
with open(file_name, 'rU') as f: #opens PW file
    data = zip(*csv.reader(f, delimiter = '\t'))
    # data = list(list(rec) for rec in csv.reader(f, delimiter='\t'))
    #reads csv into a list of lists
#print the first row
uniqAcct = []
data[0] not in used and (uniqAcct.append(data[0]) or True)
But short of looping through and matching on each unique count and then going back through and adding a new column, I am stuck. I think there must be a pythonic way of doing this, but I cannot figure it out. Any help would be greatly appreciated!
I cannot use numpy, pandas, etc as they cannot be installed on this server yet. I need to use just basic python2
So the problem here is your data structure: it's not trivial to index.
Ideally you'd change it to something readable and keep it in those containers. However, if you insist on changing it back into tuples, I'd go with this construction:
# dummy values
data = [
    (1, 1, 3),
    (1, 2,-4),
    (1, 3, 1),
    (2, 1, 1),
    (3, 1, 2),
    (3, 2, 4),
    (4, 1, 1),
    (4, 2,-1),
]
class Account:
    def __init__(self, acct):
        self.acct = acct
        self.subaccts = {} # maps sub account id to its value
    def as_tuples(self):
        # smallest value among this account's sub accounts
        min_value = min(self.subaccts.values())
        for subacct, val in self.subaccts.items():
            yield (self.acct, subacct, val, min_value)
def accounts_as_tuples(accounts):
    return [ summary for acct_obj in accounts.values() for summary in acct_obj.as_tuples() ]
accounts = {}
for acct, subacct, val in data:
    if acct not in accounts:
        accounts[acct] = Account(acct)
    accounts[acct].subaccts[subacct] = val
print(accounts_as_tuples(accounts))
But ideally, I'd keep it in the Account objects and just add a method that extracts the minimal value of the account when it's needed.
Here is another way using your initial approach.
Modify the way you import your data, so you can easily handle it in python.
import csv
mylist = []
with open(file_name, 'rU') as f: #opens PW file
    data = csv.reader(f, delimiter = '\t')
    for row in data:
        splitted = row[0].split(',')
        # this is in case you need integers
        splitted = [int(i) for i in splitted]
        mylist += [splitted]
Then, add the fourth column:
updated = []
for acc in sorted(set(row[0] for row in mylist)):
    acclist = [x for x in mylist if x[0] == acc]
    # minimum of the value column (index 2) only, not of every column
    m = min(x[2] for x in acclist)
    for l in acclist:
        l.append(m)
    updated += acclist
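The per-account minimum can also be computed with a plain dictionary in two passes, which avoids re-filtering the list for every account. This is a minimal sketch using only built-ins (it should run on Python 2 and 3); the names rows and min_by_acct are illustrative, and the rows are assumed to be already parsed into [acct, subacct, value] lists of integers.

```python
rows = [
    [1, 1, 3], [1, 2, -4], [1, 3, 1],
    [2, 1, 1],
    [3, 1, 2], [3, 2, 4],
    [4, 1, 1], [4, 2, -1],
]

# Pass 1: smallest value (column index 2) seen for each account (column index 0).
min_by_acct = {}
for acct, subacct, value in rows:
    if acct not in min_by_acct or value < min_by_acct[acct]:
        min_by_acct[acct] = value

# Pass 2: build new rows with the account minimum appended as a fourth column.
updated = [row + [min_by_acct[row[0]]] for row in rows]

for row in updated:
    print(row)
```

The original rows are left unchanged, so column 3 stays available for later calculations, and the input order is preserved.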

Pandas Dataframe ValueError: Shape of passed values is (X, ), indices imply (X, Y)

I am getting an error and I'm not sure how to fix it.
The following seems to work:
def random(row):
    return [1,2,3,4]
df = pandas.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
df.apply(func = random, axis = 1)
and my output is:
[1,2,3,4]
[1,2,3,4]
[1,2,3,4]
[1,2,3,4]
However, when I add a column set to a value such as 1 or None:
def random(row):
    return [1,2,3,4]
df = pandas.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
df['E'] = 1
df.apply(func = random, axis = 1)
I get the error:
ValueError: Shape of passed values is (5,), indices imply (5, 5)
I've been wrestling with this for a few days now and nothing seems to work. What is interesting is that when I change
def random(row):
    return [1,2,3,4]
to
def random(row):
    print [1,2,3,4]
everything seems to work normally.
This question is a clearer way of asking an earlier question of mine, which I feel may have been confusing.
My goal is to compute a list for each row and then create a column out of that.
EDIT: I originally start with a dataframe that has one column. I add 4 columns in 4 different apply steps, and then when I try to add another column I get this error.
If your goal is to add a new column to the DataFrame, just write your function so that it returns a scalar value (not a list), something like this:
>>> def random(row):
...     return row.mean()
and then use apply:
>>> df['new'] = df.apply(func = random, axis = 1)
>>> df
          A         B         C         D       new
0  0.201143 -2.345828 -2.186106 -0.784721 -1.278878
1 -0.198460  0.544879  0.554407 -0.161357  0.184867
2  0.269807  1.132344  0.120303 -0.116843  0.351403
3 -1.131396  1.278477  1.567599  0.483912  0.549648
4  0.288147  0.382764 -0.840972  0.838950  0.167222
I don't know if it is possible for your new column to contain lists, but it is definitely possible for it to contain tuples ((...) instead of [...]):
>>> def random(row):
...     return (1,2,3,4,5)
...
>>> df['new'] = df.apply(func = random, axis = 1)
>>> df
          A         B         C         D              new
0  0.201143 -2.345828 -2.186106 -0.784721  (1, 2, 3, 4, 5)
1 -0.198460  0.544879  0.554407 -0.161357  (1, 2, 3, 4, 5)
2  0.269807  1.132344  0.120303 -0.116843  (1, 2, 3, 4, 5)
3 -1.131396  1.278477  1.567599  0.483912  (1, 2, 3, 4, 5)
4  0.288147  0.382764 -0.840972  0.838950  (1, 2, 3, 4, 5)
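If the goal is one list per row, here is a hedged sketch of two ways to get there that do not depend on how apply interprets a returned list, which has varied between pandas versions. The function make_list and the column name as_list are hypothetical, and result_type="expand" assumes pandas 0.23 or later.

```python
import numpy as np
import pandas as pd

def make_list(row):
    # Hypothetical per-row computation that returns a fixed-length list.
    return [row["A"], row["B"], row["A"] + row["B"]]

df = pd.DataFrame(np.arange(10, dtype=float).reshape(5, 2), columns=list("AB"))

# Option 1: spread each list across new columns (labelled 0, 1, 2).
expanded = df.apply(make_list, axis=1, result_type="expand")

# Option 2: keep each list as a single object in one column,
# building the lists in plain Python and wrapping them in a Series.
df["as_list"] = pd.Series(
    [make_list(row) for _, row in df.iterrows()], index=df.index
)

print(expanded.shape)  # (5, 3)
print(df)
```

Option 2 stores Python objects, so vectorised operations will not work on that column. Option 1 is usually easier to work with afterwards.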
I use the code below and it works just fine (your_data and columns are placeholders for your own values):
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array(your_data), columns=columns)