Subsample rows with conditions in pandas

Subsample rows with conditions in pandas - python-2.7

I am trying to do in pandas something that I can already do outside pandas (code below), but the plain-Python version is hard to read.
Goal: subsample the rows of a list of lists (or DataFrame) at an interval of at most 10 rows, keeping extra rows whenever the value in the "state" column changes. In addition, this should be done separately for the values 'a' and 'b' of the dtype column.
Code to reproduce the intended output:
# input (list of list, but could be converted to DataFrame)
# columns: 1:index, 2:state, 3:dtype, 4:value.
x = [
[1, 0, 'b', 93.8],
[2, 0, 'b', 97.4],
[3, 0, 'b', 76.1],
[4, 0, 'b', 21.1],
[5, 0, 'b', 65.7],
[6, 0, 'b', 90.8],
[7, 0, 'b', 63.8],
[8, 0, 'b', 82.9],
[9, 0, 'b', 19.8],
[10, 0, 'b', 10.2],
[11, 0, 'b', 1.3],
[12, 1, 'b', 37.6],
[13, 0, 'b', 18.2],
[14, 0, 'b', 16.9],
[15, 0, 'b', 95.6],
[16, 1, 'b', 23.7],
[17, 0, 'b', 54.1],
[18, 0, 'b', 99.0],
[19, 0, 'b', 16.3],
[20, 0, 'a', 80.7],
[21, 0, 'a', 23.1],
[22, 0, 'a', 96.6],
[23, 0, 'a', 56.7],
[24, 0, 'a', 45.3],
[25, 1, 'a', 58.0],
[26, 0, 'a', 49.9],
[27, 0, 'a', 91.3],
[28, 0, 'b', 60.2],
[29, 0, 'b', 76.8],
[30, 0, 'b', 45.3],
[31, 0, 'b', 69.6],
[32, 0, 'b', 99.0],
[33, 0, 'b', 29.5],
[34, 0, 'b', 11.0],
[35, 0, 'b', 68.9],
[36, 0, 'b', 75.8],
[37, 1, 'b', 89.8],
[38, 0, 'b', 57.7],
[39, 1, 'b', 20.3],
[40, 0, 'b', 98.6],
[41, 0, 'b', 96.7],
[42, 0, 'b', 17.9],
[43, 1, 'b', 14.6],
[44, 0, 'b', 92.5],
[45, 0, 'b', 33.6],
[46, 1, 'b', 58.9],
[47, 1, 'b', 71.9],
[48, 0, 'b', 74.9],
[49, 0, 'b', 43.3],
[50, 1, 'b', 29.5],
[51, 0, 'b', 24.6],
[52, 0, 'b', 2.3],
[53, 0, 'b', 19.1],
[54, 0, 'b', 31.6],
[55, 0, 'b', 80.6],
[56, 0, 'b', 3.2],
[57, 0, 'b', 58.5],
[58, 1, 'b', 30.2],
[59, 1, 'b', 29.1],
[60, 0, 'b', 47.6],
[61, 0, 'b', 76.4],
[62, 0, 'b', 21.6],
[63, 0, 'b', 82.7],
[64, 0, 'b', 0.2],
[65, 0, 'b', 9.4],
[66, 0, 'b', 75.1],
[67, 0, 'b', 33.8],
[68, 0, 'b', 82.0],
[69, 0, 'b', 56.9],
[70, 0, 'b', 62.5],
[71, 0, 'b', 53.5],
[72, 0, 'b', 7.0],
[73, 0, 'a', 37.4],
[74, 0, 'a', 88.8],
[75, 0, 'a', 46.4],
[76, 0, 'a', 86.3],
[77, 0, 'a', 54.3],
[78, 0, 'b', 23.4],
[79, 0, 'b', 1.1],
[80, 0, 'b', 78.5],
[81, 0, 'b', 39.1],
[82, 1, 'b', 79.0],
[83, 0, 'b', 41.0],
[84, 0, 'b', 40.3],
[85, 0, 'a', 66.5],
[86, 0, 'a', 66.8],
[87, 0, 'a', 86.8],
[88, 1, 'b', 96.9],
[89, 0, 'b', 2.1],
[90, 0, 'b', 46.3],
[91, 0, 'b', 28.9],
[92, 0, 'b', 43.2],
[93, 0, 'b', 58.9],
[94, 0, 'b', 60.6],
[95, 0, 'b', 15.4],
[96, 0, 'b', 69.4],
[97, 1, 'b', 18.4],
[98, 0, 'b', 41.3],
[99, 0, 'b', 40.5]
]
Code to resample x for dtype 'a' and 'b':
def resample(x, log_interval, dtype):
    if not x:
        return
    red = []
    prev_state, next_val, last_val = 0, 0, 0
    for row in x:
        if row[2] == dtype:
            if row[0] >= next_val or row[1] != prev_state and row[0] > last_val:
                red.append(row)
                prev_state = row[1]
                next_val = row[0] + log_interval
                last_val = row[0]
    return red
red_a = resample(x, 10, 'a')
red_b = resample(x, 10, 'b')
And the expected outcome for red_a and red_b:
red_a = [
[20, 0, 'a', 80.7],
[25, 1, 'a', 58.0],
[26, 0, 'a', 49.9],
[73, 0, 'a', 37.4],
[85, 0, 'a', 66.5]
]
red_b = [
[1, 0, 'b', 93.8],
[11, 0, 'b', 1.3],
[12, 1, 'b', 37.6],
[13, 0, 'b', 18.2],
[16, 1, 'b', 23.7],
[17, 0, 'b', 54.1],
[28, 0, 'b', 60.2],
[37, 1, 'b', 89.8],
[38, 0, 'b', 57.7],
[39, 1, 'b', 20.3],
[40, 0, 'b', 98.6],
[43, 1, 'b', 14.6],
[44, 0, 'b', 92.5],
[46, 1, 'b', 58.9],
[48, 0, 'b', 74.9],
[50, 1, 'b', 29.5],
[51, 0, 'b', 24.6],
[58, 1, 'b', 30.2],
[60, 0, 'b', 47.6],
[70, 0, 'b', 62.5],
[80, 0, 'b', 78.5],
[82, 1, 'b', 79.0],
[83, 0, 'b', 41.0],
[88, 1, 'b', 96.9],
[89, 0, 'b', 2.1],
[97, 1, 'b', 18.4],
[98, 0, 'b', 41.3]
]
How can I do this in pandas?
A starting point is:
import pandas as pd
columns = ['ind', 'state', 'dtype', 'value']
df = pd.DataFrame(x, columns=columns)
But if I try a for loop, it is extremely slow (e.g. for row in df: ...).
Any idea how to proceed from here?

So starting with df = pd.DataFrame(x, columns=['ind', 'state', 'dtype', 'value']), you can first create two DataFrames (df_a and df_b), one per dtype value:
df_a = df[df['dtype'] == 'a'].copy()
df_b = df[df['dtype'] == 'b'].copy()
Then create a function select_row that you will apply to these DataFrames:
def select_row(row, log_interval):
    # using global variables might be a bit dangerous but I didn't find another way
    global prev_state, next_val, last_val
    # here are your conditions
    if (row['ind'] >= next_val) or (row['state'] != prev_state and row['ind'] > last_val):
        # change the values of the global variables
        prev_state = row['state']
        next_val = row['ind'] + log_interval
        last_val = row['ind']
        return True  # return True if your condition is met
    else:  # return False otherwise
        return False
Now you can create a column in df_a and df_b with a Boolean value:
log_interval = 10
prev_state, next_val, last_val = 0, 0, 0
df_a['bool'] = df_a.apply(select_row, args=(log_interval,), axis=1)
# same for df_b, but don't forget to reset your global values
prev_state, next_val, last_val = 0, 0, 0
df_b['bool'] = df_b.apply(select_row, args=(log_interval,), axis=1)
Finally, you can create your two outputs by selecting the rows of df_a (and df_b) that have True in column 'bool' and dropping that column:
red_a = df_a[df_a['bool']].drop('bool', axis=1)
red_b = df_b[df_b['bool']].drop('bool', axis=1)
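
The selection above can also be written without global variables. What follows is only a minimal sketch under a few assumptions: the helper names keep_mask and resample_df are made up for illustration, and the small DataFrame is hypothetical sample data, not the x from the question. The stateful rule stays in a plain Python loop over two columns, and groupby('dtype') takes care of the split into 'a' and 'b'.

```python
import pandas as pd


def keep_mask(ind, state, log_interval):
    """Hypothetical helper: mark the rows to keep with the rule from the question."""
    prev_state, next_val, last_val = 0, 0, 0
    mask = []
    for i, s in zip(ind, state):
        keep = i >= next_val or (s != prev_state and i > last_val)
        if keep:
            prev_state, next_val, last_val = s, i + log_interval, i
        mask.append(keep)
    return mask


def resample_df(df, log_interval=10):
    """Hypothetical helper: return one subsampled DataFrame per dtype value."""
    out = {}
    for name, group in df.groupby('dtype'):
        mask = keep_mask(group['ind'].tolist(), group['state'].tolist(), log_interval)
        out[name] = group.loc[mask]
    return out


# Made-up sample data, much shorter than x in the question.
rows = [
    [1, 0, 'b', 93.8], [2, 0, 'b', 97.4], [3, 1, 'b', 76.1], [4, 0, 'b', 21.1],
    [5, 0, 'a', 65.7], [6, 0, 'a', 90.8], [7, 1, 'a', 63.8],
    [12, 0, 'b', 37.6], [15, 0, 'b', 95.6], [20, 0, 'a', 80.7],
]
df = pd.DataFrame(rows, columns=['ind', 'state', 'dtype', 'value'])
reduced = resample_df(df, log_interval=10)
red_a, red_b = reduced['a'], reduced['b']
print(red_a['ind'].tolist())  # [5, 7, 20]
print(red_b['ind'].tolist())  # [1, 3, 4, 15]
```

Because each row's decision depends on the previously kept row, the loop itself is hard to vectorize; the sketch only avoids the shared global state and the need to reset it between groups.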

Related

Pytorch tensor dimension multiplication

I'm trying to implement the Grad-CAM algorithm:
https://arxiv.org/pdf/1610.02391.pdf
My arguments are:
activations: Tensor with shape torch.Size([1, 512, 14, 14])
alpha values: Tensor with shape torch.Size([512])
I want to multiply each activation along dimension 1 (size 512) by its corresponding alpha value. For example, if a value in the i-th channel of the activations is 4 and the i-th alpha value is 5, then the new value would be 20.
The shape of the output should be torch.Size([1, 512, 14, 14]).

Assuming the desired output is of shape (1, 512, 14, 14), you can achieve this with torch.einsum:
torch.einsum('nchw,c->nchw', x, y)
Or with a plain element-wise multiplication that relies on broadcasting, but you will first need to add a couple of singleton dimensions to y:
x*y[None, :, None, None]
Here's an example with x.shape = (1, 4, 2, 2) and y.shape = (4,):
>>> x = torch.arange(16).reshape(1, 4, 2, 2)
tensor([[[[ 0,  1],
          [ 2,  3]],
         [[ 4,  5],
          [ 6,  7]],
         [[ 8,  9],
          [10, 11]],
         [[12, 13],
          [14, 15]]]])
>>> y = torch.arange(1, 5)
tensor([1, 2, 3, 4])
>>> x*y[None, :, None, None]
tensor([[[[ 0,  1],
          [ 2,  3]],
         [[ 8, 10],
          [12, 14]],
         [[24, 27],
          [30, 33]],
         [[48, 52],
          [56, 60]]]])
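
For the shapes in the question, the two approaches can be checked against each other. This is a small sketch with random stand-in tensors (the names activations and alpha are assumptions taken from the question's wording, not real Grad-CAM outputs); view(1, -1, 1, 1) is just another way of writing the [None, :, None, None] indexing.

```python
import torch

# Random stand-ins for the tensors described in the question.
activations = torch.randn(1, 512, 14, 14)
alpha = torch.randn(512)

# Reshape alpha to (1, 512, 1, 1) so it broadcasts over batch, height and width.
weighted = activations * alpha.view(1, -1, 1, 1)

# The einsum form should give the same values.
weighted_einsum = torch.einsum('nchw,c->nchw', activations, alpha)

print(weighted.shape)
print(torch.allclose(weighted, weighted_einsum))
```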

Python: binary vector

I have a set of indices:
indices = (['1', '1.2', '2', '2.2', '3', '4'])
and a dataset, where the first element identifies a person, the second a round, and the third is the index from the indices set:
dataset = [['A', '1', '1'], ['A', '1', '1.2'], ['B', '1', '2'], ['C', '2', '3']]
I would like to form a binary vector, where for each person and for each individual round, the indices are marked either present (with a 1) or not (with a 0).
The desired output would be something like so, where for A, the vector represents the presence of the indices 1 and 1.2, for B, the index 2, and for C, the index 3. Note that for A, there is only one record, but 2 indices are present.
['A', '1', '1, 1, 0, 0, 0, 0']
['B', '1', '0, 0, 1, 0, 0, 0']
['C', '2', '0, 0, 0, 0, 1, 0']
I'm having a bit of trouble getting my head around looping the indices over the dataset. My idea was to loop through the indices once for every list in the dataset, but I don't think this is the most efficient way. Any help would be appreciated!

I'd do it something like this:
from itertools import groupby
for k, g in groupby(dataset, lambda x: x[:2]):
    vals = [x[2] for x in g]
    print(k + [", ".join("1" if x in vals else "0" for x in indices)])
Output:
['A', '1', '1, 1, 0, 0, 0, 0']
['B', '1', '0, 0, 1, 0, 0, 0']
['C', '2', '0, 0, 0, 0, 1, 0']
Is this what you were looking for?
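
One caveat: itertools.groupby only groups consecutive items with equal keys, so the snippet above relies on the records for each person and round being adjacent. If that is not guaranteed, a dictionary-based grouping is one alternative. The sketch below uses the question's data in a deliberately shuffled order; the names seen and result are just for illustration.

```python
from collections import OrderedDict

indices = ['1', '1.2', '2', '2.2', '3', '4']
# Same records as in the question, but deliberately not sorted by person.
dataset = [['A', '1', '1'], ['B', '1', '2'], ['A', '1', '1.2'], ['C', '2', '3']]

# Collect the indices seen for each (person, round) pair, keeping first-seen order.
seen = OrderedDict()
for person, rnd, index in dataset:
    seen.setdefault((person, rnd), set()).add(index)

result = [
    [person, rnd, ', '.join('1' if i in present else '0' for i in indices)]
    for (person, rnd), present in seen.items()
]
for row in result:
    print(row)
```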

Here's a solution without loops:
import pandas as pd
indlist = ['1', '1.2', '2', '2.2', '3', '4']
dataset = [['A', '1', '1'], ['A', '1', '1.2'], ['B', '1', '2'], ['C', '2', '3']]
df = pd.DataFrame(dataset, columns=['player', 'round', 'ind']).set_index('ind').reindex(indlist)
ans = df.reset_index().pivot(index='player', columns='ind', values='round').fillna(0)[1:]
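
If a 0/1 table per person and round is what is wanted, pd.crosstab is another loop-free option. This is a sketch rather than a replacement for the answer above; the variable names counts and binary are arbitrary.

```python
import pandas as pd

indices = ['1', '1.2', '2', '2.2', '3', '4']
dataset = [['A', '1', '1'], ['A', '1', '1.2'], ['B', '1', '2'], ['C', '2', '3']]
df = pd.DataFrame(dataset, columns=['player', 'round', 'ind'])

# Count occurrences of each index per (player, round) pair.
counts = pd.crosstab([df['player'], df['round']], df['ind'])
# Add columns for indices that never occur, then cap the counts at 1.
binary = counts.reindex(columns=indices, fill_value=0).clip(upper=1)
print(binary)
```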

Integrate multiple dictionaries in django

I've been having real trouble with lists lately; I'm a PHP developer finding my way in Python. My question relates to my previous question.
I now have a dictionary keyed by id_position whose values are flags in the order [Top, Right, Bottom, Left, Center]:
a = {'41': [0, 0, 0, 0, 1], '42': [0, 0, 1, 0, 1], '43': [0, 0, 0, 0, 1], '44': [0, 0, 0, 0, 1]}
and another dictionary that contains my id_position and status:
b = {'44': 'statusC', '42': 'statusB', '41': 'statusA', '43': 'statusC'}
I want to include dict a in the code below, which saves dict b.
for pos, stat in b.items():
    MyModel.objects.create(position=pos, status=stat, Top="", Right="", Bottom="", Left="")
How can I make this work?
Can you recommend a study list to help me move from PHP to Django?
UPDATE:
I followed this and added my code below:
c = {}
for key in set().union(a, b):
    if key in a: c.setdefault(key, []).extend(a[key])
    if key in b: c.setdefault(key, []).extend(b[key])
print(c)
and it returned:
{
    '42': [0, 0, 1, 0, 1, 's', 't', 'a', 't', 'u', 's', 'B'],
    '41': [0, 0, 0, 0, 1, 's', 't', 'a', 't', 'u', 's', 'A'],
    '44': [0, 0, 0, 0, 1, 's', 't', 'a', 't', 'u', 's', 'C'],
    '43': [0, 0, 0, 0, 1, 's', 't', 'a', 't', 'u', 's', 'C']
}
My problem now is that the status string is being split into separate characters.

Not sure what you're looking for...
Something like this?
Top=a[pos][0]
Right=a[pos][1]
Bottom=a[pos][2]
Left=a[pos][3]
Hope it helps.
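
Put together, the lookup could be sketched like this in plain Python, without a Django model. The FLAG_NAMES list and the fields dictionary are illustrative names, and passing the result to MyModel.objects.create(**fields) assumes MyModel has fields with exactly these names, which the question does not confirm (it never shows a Center field). The second loop shows the UPDATE code with append() for the status, since extend() iterates over a string one character at a time.

```python
a = {'41': [0, 0, 0, 0, 1], '42': [0, 0, 1, 0, 1], '43': [0, 0, 0, 0, 1], '44': [0, 0, 0, 0, 1]}
b = {'44': 'statusC', '42': 'statusB', '41': 'statusA', '43': 'statusC'}

# Flag order as described in the question.
FLAG_NAMES = ['Top', 'Right', 'Bottom', 'Left', 'Center']

rows = []
for pos, stat in sorted(b.items()):
    fields = {'position': pos, 'status': stat}
    # Pair each flag name with its value; unknown positions default to all zeros.
    fields.update(zip(FLAG_NAMES, a.get(pos, [0] * len(FLAG_NAMES))))
    rows.append(fields)
    # With Django this would become MyModel.objects.create(**fields),
    # assuming the model defines fields with these names.

# Merging the two dicts: append() keeps the status string whole.
c = {}
for key in set(a) | set(b):
    if key in a:
        c.setdefault(key, []).extend(a[key])
    if key in b:
        c.setdefault(key, []).append(b[key])

print(rows[1])
print(c['42'])
```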

How to replace values in a list at indexed positions?

I have the following list of text positions, with all values set to '-999' by default:
List = [(70, 55), (170, 55), (270, 55), (370, 55),
        (70, 85), (170, 85), (270, 85), (370, 85)]
for val in List:
    self.depth = wx.TextCtrl(panel, -1, value='-999', pos=val, size=(60,25))
I have a list of indices and the corresponding values, such as:
indx = ['2','3']
val = ['3.10','4.21']
I want to replace the values at index locations '2' and '3' in 'List' with '3.10' and '4.21' respectively, and keep the rest as '-999'. Any suggestions?

Solved. I used the following example from a similar question:
>>> s, l, m
([5, 4, 3, 2, 1, 0], [0, 1, 3, 5], [0, 0, 0, 0])
>>> d = dict(zip(l, m))
>>> d  # a dict is better than using two lists, I think
{0: 0, 1: 0, 3: 0, 5: 0}
>>> [d.get(i, j) for i, j in enumerate(s)]
[0, 0, 3, 0, 1, 0]
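
Applied to the data from the question, the same idea might look like the sketch below. It assumes the entries of indx are zero-based positions into the list (the question does not say), and the names positions, overrides and values are made up for illustration. The wx.TextCtrl call is left as a comment so the snippet runs without wxPython.

```python
positions = [(70, 55), (170, 55), (270, 55), (370, 55),
             (70, 85), (170, 85), (270, 85), (370, 85)]
indx = ['2', '3']
val = ['3.10', '4.21']

# Map integer index -> replacement value (assumes zero-based indices).
overrides = dict(zip((int(i) for i in indx), val))

# One value per position: the override if present, otherwise the default.
values = [overrides.get(i, '-999') for i in range(len(positions))]
print(values)
# ['-999', '-999', '3.10', '4.21', '-999', '-999', '-999', '-999']

# In the wx code, each control would then get its own value:
# for pos, value in zip(positions, values):
#     wx.TextCtrl(panel, -1, value=value, pos=pos, size=(60, 25))
```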

Change the values of a list?

liste = [1,2,8,12,19,78,34,197,1,-7,-45,-97,-32,23]
liste2 = []
def repetisjon(liste,liste2):
    for count in liste:
        if count > 0:
            liste2.append(1)
        elif count < 0:
            liste2.append(0)
    return liste2
print (repetisjon(liste,liste2))
The point is to change all the values of the list. If a value is greater than or equal to 0, it should be replaced by 1, and if it's lower than 0, it should be replaced by 0. But I wasn't able to change the current list; the only solution I found was to make a new list. Is there any way to change the current list without making a new one? I tried this as well, but it didn't work at all:
liste = [4,8,43,4,78,24,8,45,-78,-6,-7,-3,8,-12,4,36]
def repe (liste):
    for count in liste:
        if count > 0:
            count == 1
        else:
            count == 0
    print (liste)
repe(liste)

Here, I replace the contents of liste with the transformed data. Since sameliste points to the same list, its value changes too.
>>> sameliste = liste = [1,2,8,12,19,78,34,197,1,-7,-45,-97,-32,23]
>>> sameliste
[1, 2, 8, 12, 19, 78, 34, 197, 1, -7, -45, -97, -32, 23]
>>> liste
[1, 2, 8, 12, 19, 78, 34, 197, 1, -7, -45, -97, -32, 23]
>>> liste[:] = [int(x >= 0) for x in liste]
>>> liste
[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
>>> sameliste
[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
>>>
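
As for why the second attempt in the question has no effect: count == 1 is a comparison whose result is thrown away, and even count = 1 would only rebind the loop variable, not the list element. An explicit loop that assigns by index also changes the list in place. A small sketch using the question's second list (the name alias is just for illustration):

```python
liste = [4, 8, 43, 4, 78, 24, 8, 45, -78, -6, -7, -3, 8, -12, 4, 36]
alias = liste  # second name for the same list object

# Assigning to liste[i] changes the list itself, so every name
# that refers to it sees the new values.
for i, value in enumerate(liste):
    liste[i] = 1 if value >= 0 else 0

print(liste)
print(alias is liste)
```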
